CBG-3271 allow in memory buckets to persist #16
Conversation
da6d24d to cafb640
- In-memory buckets now live for the lifecycle of the program until Bucket.CloseAndDelete is called. This lets a bucket be closed without removing its data, which is a common pattern in Sync Gateway tests that use persistent config to update database configuration.
- When a bucket is first created, a copy is stored in the bucket registry, and this copy is never closed until:
  - in memory: CloseAndDelete is called
  - on disk: Close is called, bringing the bucket's refcount to 0
- Force the bucket name to be defined, and do not allow multiple copies of a persistent bucket to be opened at different paths. Note that this is a blunt instrument and is indiscriminate to path manipulations: it matches paths lexically rather than normalizing them. The idea is to err on the side of safety; Sync Gateway is not architected to support multiple buckets with the same name that do not have the same backing data.

Implementation:

The global state of a bucket consists of two things:

- a *sql.DB representing the underlying SQLite connection
- the dcpFeeds for each connection that exists

The sql.DB connection can be opened multiple times on the same path, and this was the original implementation of rosmar. However, it can't be opened multiple times for in-memory files except via the cache=shared query parameter to sqlite3_open. That ended up causing behavior I didn't understand, and isn't really supported by SQLite, since multiple connections to a database are only coordinated through a WAL when on disk.

DCP feeds in rosmar work by pushing events onto a queue on each CUD operation, which can then be read by any running feeds. Instead of having separate feeds for each copy of a Bucket and publishing to them via `bucketsAtUrl`, we now have a single canonical set of bucket feeds; this field has moved from Collection to Bucket. This addresses https://issues.couchbase.com/browse/CBG-3540.

Whether a bucket is open or closed is controlled by Bucket._db(), which is called by all CRUD operations. A Collection has a pointer to its parent Bucket. Each Bucket opened now creates its Collections dynamically, but these share pointers to the cached in-memory versions.

Depends on couchbase/sg-bucket#110
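To make the shared-state model above concrete, here is a minimal sketch in Go. It is illustrative only: the field names (`sqliteDB`, `feeds`) and the stub `dcpFeed` type are assumptions rather than rosmar's actual definitions, and locking is omitted.

```go
package sketch

import (
	"database/sql"
	"errors"
)

// dcpFeed stands in for rosmar's DCP feed type; the real type carries the
// event queue that CUD operations push onto.
type dcpFeed struct {
	events chan string
}

// Bucket is a simplified handle. Every copy opened at the same URL shares the
// same *sql.DB and the same canonical set of feeds; only `closed` is per-copy.
type Bucket struct {
	name     string
	sqliteDB *sql.DB    // shared: the single underlying SQLite connection
	feeds    []*dcpFeed // shared: one canonical set of feeds per bucket
	closed   bool       // not shared: closing this copy leaves others usable
}

// _db is the gate used by CRUD operations: it hands back the shared
// connection only while this particular copy is still open.
func (b *Bucket) _db() (*sql.DB, error) {
	if b.closed {
		return nil, errors.New("bucket " + b.name + " has been closed")
	}
	return b.sqliteDB, nil
}

// Collection keeps a pointer to its parent Bucket, so it always goes through
// the bucket's _db() gate and sees the shared state.
type Collection struct {
	bucket *Bucket
}
```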
bucket-registry.go
Outdated
}
// Copy the slice before mutating, in case a client is iterating it:
Whereabouts do we iterate over the slice without holding the lock?
This code was carried over from the original implementation; I've simplified it for refcounting.
bucket_api.go
Outdated
	bucket.closed = true
}

func (bucket *Bucket) _closeAllInstances() {
What are the 'AllInstances' here, since it looks like it's just closing this single bucket's connection?
Renamed this closeSQliteDB, although it really closes more than that. If this function is called on any bucket, it will shut down all Buckets that have a matching name. Any suggestion of a better name to reflect that would be welcome.
	collections:     make(map[sgbucket.DataStoreNameImpl]*Collection),
	collectionFeeds: make(map[sgbucket.DataStoreNameImpl][]*dcpFeed),
	mutex:           &sync.Mutex{},
	nextExp:         new(uint32),
I'm not familiar with this usage - what's the advantage of setting new(uint32) instead of leaving nil?
This is just initializing an int pointer to 0; is there a more canonical way without writing a function?
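As an aside, here is a standalone example (mine, not code from this PR) of what `new(uint32)` provides: a non-nil pointer to a zero value, which can be handed to atomics or dereferenced without a nil check.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	// new(uint32) allocates a uint32, zero-initializes it, and returns *uint32.
	nextExp := new(uint32)

	// Equivalent spelling without new():
	var zero uint32
	alsoNextExp := &zero

	// Because the pointer is non-nil, it can be used with atomics immediately.
	atomic.StoreUint32(nextExp, 42)
	fmt.Println(*nextExp, *alsoNextExp) // 42 0

	// A nil *uint32, by contrast, would panic on dereference or atomic use.
	var nilPtr *uint32
	fmt.Println(nilPtr == nil) // true
}
```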
bucket-registry.go
Outdated
var bucketRegistry = map[string][]*Bucket{} // Maps URL to slice of Buckets at that URL
var bucketRegistryMutex sync.Mutex          // Thread-safe access to bucketRegistry
type bucketRegistry struct {
	byURL map[string][]*Bucket
Does this need to be a `[]*Bucket`, or is it just used for ref counting and so could be something like `[]bool` or `[]struct{}`?
I replaced this with integer refcounting.
bucket-registry.go
Outdated
var bucketRegistryMutex sync.Mutex // Thread-safe access to bucketRegistry
type bucketRegistry struct {
	byURL       map[string][]*Bucket
	inMemoryRef map[string]*Bucket
As discussed, the names are a bit confusing if 'byURL' is actually the ref count and 'inMemoryRef' is the lookup of bucket by URL.
I picked the names `bucketCount` for the refcount and `buckets` for the actual cached copy of the bucket.
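For reference, with those names the registry could look roughly like this. This is a sketch only, reusing the simplified `Bucket` type from the earlier sketch and needing `sync`; the real struct likely carries more fields.

```go
// bucketRegistry, sketched with the names discussed above: bucketCount holds
// the integer refcount per bucket URL, and buckets holds the canonical cached
// Bucket for that URL.
type bucketRegistry struct {
	lock        sync.Mutex
	bucketCount map[string]int     // open copies per bucket URL (refcount)
	buckets     map[string]*Bucket // canonical cached Bucket per URL
}
```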
- Create a copy of the bucket in getCachedBucket so the caller doesn't have to remember to make a copy.
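A hedged sketch of that copy-on-lookup behavior, continuing the hypothetical registry above; `getCachedBucket` as shown here is illustrative, not the exact rosmar implementation.

```go
// getCachedBucket returns a fresh copy of the canonical Bucket, so the caller
// gets its own `closed` flag while still sharing the underlying *sql.DB and
// feeds, and doesn't have to remember to copy it themselves.
func (r *bucketRegistry) getCachedBucket(url string) *Bucket {
	r.lock.Lock()
	defer r.lock.Unlock()

	canonical, ok := r.buckets[url]
	if !ok {
		return nil
	}
	r.bucketCount[url]++ // one more open copy of this bucket

	copied := *canonical  // shallow copy: pointer fields stay shared
	copied.closed = false // but this copy tracks its own open/closed state
	return &copied
}
```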
Looks good to me. One suggestion for additional documentation on the bucket registry as discussed, but otherwise looks good.
bucket-registry.go
Outdated
// `realpath` or `fcntl(F_GETPATH)`) and use the canonical path as the key.
// Unfortunately Go doesn't seem to have an API for that.
// * In-memory bucket: the bucket is not deleted until CloseAndDelete is called on any of its Bucket copies.
// * On-disk bucket: the bucket is deleted from the registry when there are no open copies of it in memory. Unlike an in-memory bucket, its data stays persisted on disk and can be reopened.
As discussed, a bit more context on how buckets are managed here will make this easier to maintain. In particular I'm thinking of explicitly documenting:
- what's shared between bucket copies (almost everything)
- what's not shared, and why (in particular 'closed')
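One possible shape for that documentation, written as a doc comment over the registry; the wording is mine, pieced together from the points in this PR rather than taken from the code.

```go
// bucketRegistry tracks every open Bucket, keyed by URL.
//
// Shared between copies of the same bucket (all copies point at the canonical
// cached instance): the underlying *sql.DB connection, the single canonical
// set of DCP feeds, and the collections.
//
// Not shared: the `closed` flag. Closing one copy only marks that copy as
// closed; the canonical bucket and any other open copies keep working.
//
// Lifecycle:
//   - In-memory bucket: kept in the registry (and its data kept alive) until
//     CloseAndDelete is called on some copy.
//   - On-disk bucket: removed from the registry once the refcount of open
//     copies drops to zero; the data stays on disk and can be reopened.
```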